JASIS Forthcoming –Jiangping Chen A Lexical Knowledge Base Approach for English-Chinese Cross Language Information Retrieval

نویسنده

  • Jiangping Chen
چکیده

This study proposes and explores a natural language processing (NLP) based strategy to address out-ofdictionary and vocabulary mismatch problems in query translation based English-Chinese Cross Language Information Retrieval (EC-CLIR). The strategy, named the LKB approach, is to construct a lexical knowledge base (LKB) and to use it for query translation. This paper describes the LKB construction process, which customizes available translation resources based on the document collection of the EC-CLIR system. The evaluation shows that the LKB approach is very promising. It consistently increased the percentage of correct translations and decreased the percentage of missing translations in addition to effectively detecting the vocabulary gap between the document collection and the translation resource of the system. The comparative analysis of the top EC-CLIR results using the LKB and two other translation resources demonstrates that the LKB approach has produced significant improvement in EC-CLIR performance compared to performance using the original translation resource without customization. It has also achieved the same level of performance as a sophisticated machine translation system. The study concludes that the LKB approach has the potential to be an empirical model for developing real-world CLIR systems. Linguistic knowledge and NLP techniques, if appropriately used, can improve the effectiveness of EC-CLIR.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A lexical knowledge base approach for English-Chinese cross-language information retrieval

the LKB approach, is to construct a lexical knowledge base (LKB) and to use it for query translation. In this article, the author describes the LKB construction process, which customizes available translation resources based on the document collection of the EC-CLIR system. The evaluation shows that the LKB approach is very promising. It consistently increased the percentage of correct translat...

متن کامل

A High-Accurate Chinese-English NE Backward Translation System Combining Both Lexical Information and Web Statistics

Named entity translation is indispensable in cross language information retrieval nowadays. We propose an approach of combining lexical information, web statistics, and inverse search based on Google to backward translate a Chinese named entity (NE) into English. Our system achieves a high Top-1 accuracy of 87.6%, which is a relatively good performance reported in this area until present.

متن کامل

Construction of a Chinese-english Verb Lexicon for Embedded Machine Translation in Cross-language Information Retrieval

This paper addresses the problem of automatic acquisition of lexical knowledge for rapid construction of MT engines multilingual applications. We describe new techniques for large-scale construction of a Chinese-English verb lexicon and we evaluate the coverage and eeectiveness of the resulting lexicon for a structured MT approach that is embedded in a cross-language information retrieval syste...

متن کامل

Chinese QA and CLQA: NTCIR-5 QA Experiments at UNT

This paper describes our participation in the NTCIR-5 CLQA task. Three runs were officially submitted for three subtasks: Chinese Question Answering, English-Chinese Question Answering, and Chinese-English Question Answering. We expanded our TREC experimental QA system EagleQA this year to include Chinese QA and Cross-Language QA capabilities. Various information retrieval and natural language ...

متن کامل

Research on Lucene-based English-Chinese Cross-Language Information Retrieval

In this paper, we present our English-Chinese Cross-Language Information Retrieval (CLIR) system. We focus our attention on finding effective translation equivalents between English and Chinese, and improving the performance of Chinese IR. On English-Chinese CLIR, we adopt query translation as the dominant strategy, and utilize English-Chinese bilingual dictionary as the important knowledge res...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005